Arrival Time Statistics in Global Disease Spread 
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Metapopulation models describing cities with different populations coupled by the travel of in- 
dividuals are of great importance in the understanding of disease spread on a large scale. An 
important example is the Rvachev-Longini model [Math. Biosci. 75, 3-22 (1985)] which is widely 
used in computational epidemiology. Few analytical results are however available and in particular 
little is known about paths followed by epidemics and disease arrival times. We study the arrival 
time of a disease in a city as a function of the starting seed of the epidemics. We propose an ana- 
lytical Ansatz, test it in the case of a spreading on the world wide air transportation network, and 
show that it predicts accurately the arrival order of a disease in world-wide cities. 
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In modern societies, individuals can easily travel over a 
wide range of spatial and temporal scales. The intercon- 
nections of areas and populations through various means 
of transport have important effects on the geographical 
spread of epidemics. In particular, the structure and 
the different complexity levels of the air-transportation 
network are responsible for the heterogeneous and seem- 
ingly erratic outbreak patterns observed in the worldwide 
propagation of diseases [l} as recently documented for 
SARS Hi]. In order to describe such a complex phe- 
nomenon and to obtain powerful numerical forecasting 
tools, different levels of description are possible, ranging 
from a simple global mean-field to detailed agent-based 
simulations 0, [3, H, 0, H, B that recreate entire pop- 
ulations and their dynamics at the scale of the single 
individual p^ . 

At large scale, such as the world-wide level, a very im- 
portant class of models in modern epidemiology are the 
so-called metapopulation models which use a description 
at two levels by dividing the global population into inter- 
connected subpopulations. Within each subpopulation, 
a mean-field like model of epidemic spreading is used, 
while the spread from one subpopulation to another is 
due to the travel of individuals. Agents of each subpopu- 
lation can be in various states (healthy, infectious, recov- 
ered...), change state by contact with other agents, and 
diffuse on the transportation network between subpopu- 
lations. Metapopulation models can thus be considered 
as reaction-diffusion processes, which opens very interest- 
ing perspectives and issues fll] within the global frame- 
work of dynamical phenomena occurring on complex net- 
works [H, [H, [ll, [iHl ■ For the description of worldwide 
epidemic spreading, the subpopulations are cities con- 
nected by a transportation network in which links corre- 
spond to the existence of passenger fiows described by the 
worldwide air-transportation network (WAN) . The WAN 
represents a major channel for the worldwide spread of 
infectious diseases [l], Q and its complex, heterogeneous 
features at various levels (degree distribution, traffic. 



populations) have recently been characterized [1^, li3| • 

In this Letter, we focus, in the framework of such 
metapopulation models, on the issue of the arrival time 
in a city of the first infectious individual. In particular, 
we study how this time depends on the origin of the dis- 
ease and on the network characteristics. This problem is 
more complex than the one of random walks on complex 
networks jl8| , since the number of infectious individuals 
diffusing on the network is constantly evolving due to the 
inner-city epidemic dynamics. We also note that refer- 
ences [1^, 1201 were also concerned with the arrival time 
problem for an epidemic spreading on a complex network, 
but in a different framework: each network node was an 
individual (susceptible or infectious), while in our case 
each node represents a whole subpopulation. After the 
precise definition of the model, we will first consider the 
simple case of a onc-dimensional topology for the trans- 
portation network in order to gain analytical insights into 
this problem. This will allow us to propose an analyti- 
cal form for the arrival time in arbitrary networks. We 
then test this form in the case of the WAN, by simulat- 
ing numerically a stochastic spreading phenomenon on 
the network, and show that we can indeed predict with a 
good accuracy the spreading phenomenon and the arrival 
order of a disease in various cities at a world-wide level. 

While the precise model describing the epidemic 
spreading at the subpopulation level could be refined 
at will in order to describe a particular disease, we are 
here interested in generic and fundamental aspects of the 
metapopulation modeling approach. We therefore re- 
strict our study to a simple SI disease model in which 
individuals are either healthy (susceptible, S) or can be- 
come infectious (I) if in contact with an infectious in- 
dividual. The Rvachev-Longini SI model [2l[ describes 
the evolution of the number of infectious /j (t) individuals 
(and also of Si{t)) in each city i through 

dth^K{{x,}) + ni{i,}) , (1) 

where the first term K of the right hand side describes 
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the (epidemic) reaction process inside each subpopula- 
tion (city), due to the interaction of individuals in the 
various possible states. In our case X G {S, 1} (we 
have checked that more involved models, such as SIS 
or SIR, give consistent results [22,]) and the standard 
homogeneous mixing assumption in each city gives [3]: 
K{{Xi}) = XIi{Ni — Ii)/Ni, where Ni is the population 
of city i and A the spreading rate. The second term 
represents the evolution due to the arrival or departure of 
infectious individuals from or to other cities and is deter- 
mined by passenger flows on the transportation network. 
This model therefore considers a simplified mechanistic 
approach with a widely used markovian assumption in 
which individuals are not labeled according to their orig- 
inal subpopulation, and at each time step the same trav- 
eling probability applies to all individuals in the subpop- 
ulation, without memory of their origin [l], [1, [2lj. De- 
noting by Wij the average number of passengers traveling 
from i to j per unit of time (wij = if there is no direct 
connection), the probability per unit time that an indi- 
vidual travels from city i to city j is then given by Wij /Ni. 
The full metapopulation model is therefore described by 



occur as instantaneous jumps of probability p 



N, 



dth = \h{t) 



This original formulation considers only expectation val- 
ues, which can take continuous values, so that "fractions" 
of infectious individuals can travel and infect neighboring 
cities arbitrarily fast [1^. To investigate arrival times, 
one therefore needs to take into account the inherent 
stochasticity of the spreading. We thus consider in all 
our numerical simulations the stochastic generalization 
described in P, |3| where the number of individuals trav- 
eling on each connection is an integer variable randomly 
extracted at each time step of length At, with average 
Atwijii/Ni (in the numerical simulations we will use 
At = 1 day); for simplicity we keep the endogenous 
growth deterministic since we are mainly concerned with 
the effect of travel, but we have checked that inclusion of 
stochastic effects as in [l] do not change our results p^ . 
Note that in real cases such as the WAN, most weights 
are symmetric {wij = wji) [T^ but the probabilities of 
travel from one city to another are not since they depend 
on the populations of the various cities: the travel effec- 
tively occurs as a random diffusion with non-symmetric 
rates on the transportation network. The topological dis- 
tance thus does not contain all the information needed to 
characterize such a process. Moreover, since most trans- 
portation networks are small-world networks, many cities 
lie at the same topological distance from a given seed, but 
will potentially be reached at very different times. 

Before turning to numerical simulations of the de- 
scribed model, we present an analytical approach to the 
determination of arrival times. Let us first consider the 
simple case of two cities (0 and 1), with populations iVo, 
A^i which are connected by a passenger flux wqi = w. We 
assume that at t = 0, there are = 1 infectious people 
in the city 0. Let us first consider that the travel events 



at discretized times, in units of At. The probability that 
the time of arrival ti of the epidemic in the city 1 is equal 
to t — nAt is then 



Pd{t^^nAt)= l-(l-p) 



\l"{nAt) 



(lAt) 



(3) 

In order to obtain the density probability P{t) of the 
arrival time in the city 1 we consider the limit At — > 0, 
using the following assumptions: (i) /"(t) ^ N, which is 
realistic for usual diseases, in which only small fractions 
of the population are infectious; (ii) the continuous limit 
for /°(t) can be used which reads j ^ (ti). Within these 
assumptions, we obtain 



P(t)dt = ^e^'-^'^'dt 

Nq 



(4) 



(the last assumption then reads 1 ^ ln(-^^)). We 
recognize in ^ a Gumbel distribution with average 
(ti) — ■i-[ln(-^^) — 7], where 7 is the Euler constant. 
The variance is Var{ti) — and does not depend of 
(the non physical contribution of the negative val- 
ues of t in the distribution has to be negligible which is 
satisfied if J^^P{t)dt = < 1). Within these as- 
sumptions, we obtain a gooci agreement between results 
of numerical simulations using discretized travel events 
[l| and the theory which uses continuous approximations 
(see Fig. ID). 
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FIG. 1: Two cities model. Arrival time ti distribution com- 
puted by numerical simulation and compared with the result 
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lO"-" (line). Inset: Same for ^ = 10 



We now consider the case of a one-dimensional line 
of cities connected by passenger fluxes of random inten- 
sity. We assume that the spreading process starts at 
city and we denote by t„ the arrival time in the city 
n. The quantities having the same unit as t„ are 1/A 
and Ni /wi , where Wi is the number of passengers travel- 
ing from i to i + 1 per unit time. Dimensional analysis 
then implies that the probability distribution of the adi- 
mensional quantity At„ must be a function of the other 
adimensional quantities which are the Wi/{lambdaNi): 
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P{\tn) = Gni^tn, where G„ is an unknown func- 

tion. One can write i„ as a sum of random variables, 
= U — ii-i which are however correlated since each 
local infection process depends on the history of the epi- 
demics in all previously infected cities. While a complete 
study of P{Xtn) is left for future work numerical 
simulations of the spreading show (Fig. [2|) that it obeys 
important invariance properties. For heterogeneous pop- 
ulations and travels {wi and Ni are distributed uniformly 
in [10, 2000] and [10^, 2.10^], respectively), the whole dis- 
tribution is invariant when one replaces (i) all the random 
weights by their geometrical mean mJ = {Yli^o Wi)^^"; (ii) 
all the random populations by their geometrical mean 
N — (n"=o^ NiY/^; (in) all weights by w and all popu- 
lations by N . The ratios of the average times for these 
different sets stay very close to 1, with deviations at most 
of the order of 5%. 

The average arrival time can thus be written as X{tn) = 
F{{^^}) where F(xi, . . . , a;„) is a symmetric function of 
its variables which depends only on the product J| Xi , and 
such that {ti) is the average of the Gumbel distribution 
(III). This leads to the following Ansatz 



Hin) ~ x{n) = In 



n 



(5) 
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FIG. 2: (A-C) Arrival time distribution on a line at city #7, 
from numerical simulations for a fixed random set of popu- 
lations {Ni} and weights {wi\ (Black circles). Red crosses: 
distributions for (A) uniform travel Wi = w, and populations 
{Ni}; (B) uniform populations Ni — N, and weights {wi}; 
(C) uniform populations Ni — N and weights Wi = w. We 
use a small value of n since most real complex networks are 
small- world: any node lies at a small distance from the seed. 

Figure [3] shows that the average arrival time in a city is 
indeed determined by x to a very good extent (while the 
arrival time at a given topological distance from the seed 
can vary a lot) . More quantitatively, x is approximately 
proportional to X{t), which it slightly overestimates since 
we neglect the flow of infectious individuals from n — 2 
to n — 1 with res pec t to the endogenous increase of /„-i 
during [t„-i;t„] [2i|. 




FIG. 3: X{tn) vs. x('^) foi" 5 cities connected on a line, with 
100 different random sets {wi,Ni}. Each point is an average 
over 1, 000 epidemics for each realization of the w's. 



We now consider a generic transportation network be- 
tween the cities. The quantity ([5]) can easily be com- 
puted on any path of length n on the network. While 
the spread can a priori follow multiple paths from one 
city to another, we can reasonably assume that the most 
probable path is the one which minimizes the value of x 
computed on it, leading to the smallest arrival time pos- 
sible (a more refined Ansatz taking into account multiple 
paths does not lead to strong differences in the final re- 
sults d^l). We thus obtain the following Ansatz for the 
arrival time at a city f of a disease starting at node s 



x(s,t) = mm 



\ Wkl J 



■7 



(6) 



where {Psj} is the set of all possible paths connecting s 
to t, and the sum is over the links {k, I) on the paths. In 
other terms, we have introduced a new (non symmetric) 
weight hi(NiX/wij) — 7 on each oriented link (i, j) of the 
network. 

We have simulated, using the model developed in 
and summarized above, a spreading phenomenon on a 
subnetwork of the WAN, composed of the 2, 400 nodes for 
which the populations are larger than 10, 000 inhabitants 
and which corresponds to 98% of the total traffic [2^ . 
The arrival times are computed by solving numerically 
the equations of the Rvachev-Longini model with dis- 
cretized random travel events, and averaging over 1,000 
realizations of the spreading with the same seed (one 
infectious individual in a given city). Figure [4] shows 
the obtained values of X{t) versus x for various initial 
seeds. We observe that the average arrival time is indeed 
determined by the value of x i^i & given city: various 
cities with the same x ^-re reached at the same time by 
the disease propagation. While x quantitatively overesti- 
mates the arrival time, the two quantities are correlated 
strongly enough, in order to obtain with a good confi- 
dence the order of arrival of the disease in different cities. 
More precisely, if we denote Ax{i,j) =| x(j) ~ x(*) 1j 
we show in Fig. [5] the probability fc{Ax) that the ar- 
rival times in one realization of the spread i*-*-* and t^^^^ 
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FIG. 4: (Color online) X{t) versus x on the WAN for diseases 
starting in different cities (whose name is specified in each 
graph). Each red circle corresponds to a city and averages 
are done over f , 000 realizations of the spreading. Crosses are 
an average over cities with the same X- When the initial seed 
is a hub, the average arrival time is larger than x in the first 
reached cities, due to the multiplicity of possible paths [2^ . 




FIG. 5: Fraction of couples of nodes correctly ranked as a 
function of their Ax (circles), in each realization of the spread, 
and cumulative distribution (squares) of the values of Ax (i.e., 
fraction of couples of cities (i, j) with Ax{i,j) > Ax). 



follow the same order as given by x(i) and xO) [i^- 
_ tO))(x(i) - x(j)) > 0]. In other words, is 
the probability that the disease arrival rank for the two 
cities i and j is correctly predicted by x- If '^xihj) is 
equal to 0, no prediction is possible and we indeed obtain 
/c(0) = 0.5. For Ax > 10, almost all node couples are 
correctly ranked. This result has however to be weighted 
by the number of couples with such a large Ax. We 
thus plot on the same figure the cumulative distribution 
p> (Ax) of the number of couples of nodes with a given 
value of Ax- We see for example that approximately 80% 
of the couples of cities have a Ax > 2 and more than 70% 
of these couples are correctly sorted (instead of just 50% 
on average if no information is available). 

From a theoretical point of view, metapopulation mod- 
els go far beyond classical random walks and deserve 
many further theoretical investigations. In this Letter, 
we have proposed an Ansatz for the arrival time of a dis- 
ease in a city, knowing the starting point of the spread. 
This Ansatz is a good approximation and predicts with 
accuracy the arrival order of the disease in the different 
cities, even if they are at the same topological distance 
from the seed [l^ . Containment strategies could use such 
information to target the cities most at risk of rapid infec- 
tion, and therefore deploy limited supplies of vaccine or 
antivirals in an efficient way. Further developments could 
include more sophisticated compartmental or metapopu- 
lation models, and the systematic investigation of various 
structures of complex networks [2^ . Finally, it would be 
interesting to extend this study to other scales, like the 
urban scale, where nodes are locations such as homes, 
offices or malls flo[. 
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